Background: Magnetic resonance imaging (MRI) studies typically employ either a single expert or multiple readers in collaboration to evaluate (read) the image results. However, no study has examined whether evaluations from multiple readers provide more reliable results than a single reader. We examined whether consistency in image interpretation by a single expert might be equal to the consistency of combined readings, defined as independent interpretations by two readers, where cases of disagreement were reconciled by consensus.
Methods: One expert neuroradiologist and one trained radiology resident independently evaluated 102 MRIs of the upper neck. The signal intensities of the alar and transverse ligaments were scored 0, 1, 2, or 3. Disagreements were resolved by consensus. Both readers repeated the grading process after 3–8 months (second evaluation). We used kappa statistics and intraclass correlation coefficients (ICCs) to assess agreement between the initial and second evaluations for each radiologist and for combined determinations. Disagreements on score prevalence were evaluated with McNemar's test.
Results: Higher consistency between the initial and second evaluations was obtained with the combined readings than with individual readings for signal intensity scores of ligaments on both the right and left sides of the spine. The weighted kappa ranges were 0.65–0.71 vs. 0.48–0.62 for combined vs. individual scoring, respectively. The combined scores also showed better agreement between evaluations than individual scores for the presence of grade 2–3 signal intensities on any side in a given subject (unweighted kappa 0.69–0.74 vs. 0.52–0.63, respectively). Disagreement between the initial and second evaluations on the prevalence of grades 2–3 was less marked for combined scores than for individual scores (P = 0.039 vs. P = 0.004, respectively).
ICCs indicated a more reliable sum score per patient for combined scores (0.74) and both readers' average scores (0.78) than for individual scores (0.55–0.69).
Conclusions: This study was the first to provide empirical support for the principle that an additional reader can improve the reproducibility of MRI interpretations compared to one expert alone. Furthermore, even a moderately experienced second reader improved the reliability compared to a single expert reader. The implications of this for clinical work require further study.
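To make the agreement statistic concrete, the linearly weighted kappa used for the ordinal 0–3 scores can be sketched as below. This is a minimal illustration with hypothetical ratings, not the study's data; the function and sample values are assumptions for demonstration only.

```python
from collections import Counter

def weighted_kappa(r1, r2, k=4):
    """Linearly weighted Cohen's kappa for ordinal scores 0..k-1.

    Larger disagreements between the two ratings are penalized more
    heavily via the disagreement weight |i - j| / (k - 1).
    """
    n = len(r1)
    obs = Counter(zip(r1, r2))          # observed joint rating counts
    m1, m2 = Counter(r1), Counter(r2)   # marginal counts per rater
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)                 # disagreement weight
            num += w * obs[(i, j)] / n               # observed disagreement
            den += w * (m1[i] / n) * (m2[j] / n)     # chance-expected disagreement
    return 1 - num / den

# Hypothetical signal-intensity scores (0-3) from two evaluations
eval_1 = [0, 1, 2, 3, 1, 0, 2, 2, 1, 3]
eval_2 = [0, 1, 2, 2, 1, 0, 3, 2, 0, 3]
print(round(weighted_kappa(eval_1, eval_2), 2))  # → 0.75
```

A value of 1.0 indicates perfect agreement, 0 indicates agreement no better than chance; the weighting means that, for example, a 3-vs-2 disagreement costs less than a 3-vs-0 disagreement.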